Day 16 - Regular expressions - Multiple matches

A date gives you a corsage, not a multiple fracture.

Little Shop of Horrors (1986)

Well, not bad at all! We are still alive after 3 lessons about something that is considered an advanced
topic. Congrats! I hope you are not only surviving, but actually enjoying the journey. I think you
are starting to appreciate that regular expressions are not actually difficult; they are, however, complicated,
full of special symbols and rules. So far we have learned how to use . for any character, square brackets
[ and ] for classes and ranges with - inside them, and finally the two anchors ^ and $.

Today we’ll have a look at multiple matches. Generally speaking, a multiple match is a repeated
match of the previous regular expression. A typical use case is when you need to match a specific
number of digits or letters, but multiple matches can also be less specific, for example matching an
indefinite number of lowercase letters.

Let’s start with exact matches, which are performed with the syntax {N}, where N is the number of
matches. As I said, all multiple match operations refer to the previous regular expression, so if you
write

$ grep -E "a{2}" examples.txt

aardvark

you are asking grep to match any run of 2 adjacent a characters, as in aardvark. The number
between braces can be any positive integer, even though using 1 makes no sense, as a single
character is already a regular expression matching one repetition of it. So, while you can execute

$ grep -E "a{1}" examples.txt

and get the correct result, this is equivalent to

$ grep -E "a" examples.txt

and I personally don’t see the point in making the regular expression harder to read by introducing
the braces. If you like complicating your life, try to create a social network in PHP. Wait a minute,
what do you mean they did it?
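
If you want to check this yourself, here is a minimal sketch. It assumes a small, made-up word list (aardvark, banana, cherry) fed to grep through a pipe, since the real examples.txt may contain different words. The variable names are just for illustration.

```shell
# Hypothetical sample words; the real examples.txt may differ.
words=$(printf '%s\n' aardvark banana cherry)

# a{2} needs two adjacent a characters, so only aardvark is selected.
two=$(printf '%s\n' "$words" | grep -E 'a{2}')
echo "$two"

# a{1} and a plain a select exactly the same lines.
with_braces=$(printf '%s\n' "$words" | grep -E 'a{1}')
without_braces=$(printf '%s\n' "$words" | grep -E 'a')
[ "$with_braces" = "$without_braces" ] && echo "same result"
```

With this word list the first command should print only aardvark, because banana has three a characters but no two of them are adjacent.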

The braces repeat the previous regular expression component, so the syntax a{2} is equivalent to a
literal aa. The syntax can be used to repeat more than letters, though, as the braces apply to any previous
component of the regular expression. This command, for example